This Project
This web page is part of an assignment in Emory University’s Biology 142 lab course. Students were divided into groups, and each group was assigned a protein of interest. They were asked to research what is known about the protein and to examine whether the newly sequenced whale shark genome shows evidence of an orthologous protein. The results of this research are given below.

Background Information
RUNX3 stands for runt-related transcription factor 3. It is a member of the runt domain family of transcription factors. This family, which comprises RUNX1, RUNX2, and RUNX3, regulates gene expression in major developmental pathways (Bangsow et al. 2001). The family is defined by the runt domain, a conserved region of 128 amino acids that functions in DNA binding, protein-protein interactions, and ATP binding. RUNX3 binds DNA together with a partner protein called CBFβ, which allows it to mediate transcription by activating or repressing its target genes (Lin et al. 2012). The RUNX3 protein is involved in the development of the gastrointestinal tract and is the shortest in the family because its gene has fewer exons (Lund and van Lohuizen 2002). Specifically, this protein controls neurogenesis in the dorsal root ganglia and cell proliferation in the gastric epithelium. RUNX3 is frequently inactivated in human gastric cancers, where it would otherwise inhibit the Akt1/β-catenin/cyclin D1 signaling pathway (Lin et al. 2012) (Figure 1). RUNX3 has also been proposed to be a tumor suppressor gene in various other tumors and cancers, such as skin cancer; when it is absent or inactivated, the growth of cancer cells is promoted (Lotem et al. 2015).
Figure 1. RUNX3 is inactivated in human gastric cancers. The figure shows the mechanism by which the RUNX3 protein inhibits the Akt1/β-catenin/cyclin D1 signaling pathway in gastric cancer (Lin et al. 2012).

Methods
Identifying Human Protein Sequence
Using the Ensembl website, the human protein sequence of RUNX3 (ENSP00000308051) was retrieved in FASTA format.
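For readers who want to handle the downloaded sequence programmatically: a FASTA record is a header line beginning with `>` followed by one or more lines of sequence. The following is a minimal sketch in Python, not part of the original workflow. The function name `parse_fasta` and the toy record are assumptions made for illustration; the toy record is not the real RUNX3 sequence.

```python
def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical toy record; NOT the real ENSP00000308051 sequence.
example = ">toy_protein example\nMRIPV\nDPSTS\n"
sequences = parse_fasta(example)
```

Joining the sequence lines matters because FASTA files usually wrap long sequences over many lines, while a BLAST query is the single unbroken sequence.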
Whale Shark Predicted Orthologs
The human protein sequence was then BLASTed against the predicted whale shark protein database using the Georgia Aquarium’s Galaxy server. The sequences of the top five predicted whale shark hits (based on the lowest E-values) were then used as queries in a BLAST against the NCBI human protein database.
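Ranking hits by lowest E-value can be sketched in a few lines of Python. The example below is only an illustration and not the Galaxy server's own procedure. The tuple layout (ID, E-value, alignment length, percent identity) and the function name `top_hits` are assumptions; the values are those reported in Table 1, deliberately listed out of order.

```python
# Values taken from Table 1, listed out of order on purpose.
# Tuple layout (an assumption for this sketch):
# (whale shark ID, E-value, alignment length, percent identity)
hits = [
    ("g26895.t1", 1e-04, 67, 26.87),
    ("g47586.t1", 1e-64, 151, 78.05),
    ("g47267.t1", 4e-93, 177, 76.27),
    ("g41064.t1", 1e-04, 51, 31.37),
    ("g43387.t1", 1e-64, 123, 70.20),
]


def top_hits(rows, n=5):
    """Return the n rows with the lowest E-values (ties keep input order)."""
    return sorted(rows, key=lambda row: row[1])[:n]


best_three_ids = [row[0] for row in top_hits(hits, n=3)]
```

A lower E-value means fewer matches of that quality are expected by chance, which is why the smallest values are treated as the best hits.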
Predicted Orthologs
Predicted orthologs of RUNX3 were identified in species other than the whale shark using the NCBI BLAST server. Protein BLASTs were performed against the protein databases for mouse, zebrafish, fruit fly, yeast, and elephant shark, with the human RUNX3 protein as the query sequence.
Phylogenetic Tree
The best hit for each species other than the whale shark (again based on the lowest E-value), along with the top five whale shark BLAST hits and the human protein sequence, was entered into ClustalW to compare the protein sequences and to create a phylogenetic tree.
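Distance-based trees are built from how different each pair of aligned sequences is. The sketch below illustrates that idea with a simple p-distance (the fraction of differing residues over ungapped columns). It is not ClustalW's actual calculation, and the function name `p_distance` and the three aligned fragments are invented for illustration.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing residues over the aligned columns without gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


# Hypothetical aligned fragments, invented for illustration only.
toy_alignment = {
    "species_A": "MRIPV-DP",
    "species_B": "MRVPVADP",
    "species_C": "MKVPLADP",
}
names = sorted(toy_alignment)
distances = {
    (x, y): p_distance(toy_alignment[x], toy_alignment[y])
    for i, x in enumerate(names)
    for y in names[i + 1:]
}
```

In this toy case the pair with the smallest distance would be joined first by a distance-based clustering method, which is the sense in which closely grouped proteins in a tree share a more recent common ancestor.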

Finding RUNX3 in the whale shark
A protein BLAST of the human RUNX3 protein sequence against the whale shark predicted protein database was performed. The top five hits from this BLAST are shown below (Table 1). These sequences were then BLASTed against the NCBI human protein database; the results are also shown below (Table 2). As seen in Table 2, none of the whale shark predicted proteins had RUNX3 as its top hit. However, most of the top hits were still members of the RUNX family, with the exception of the hit for the g26895.t1 whale shark gene. Because none of the top hits was RUNX3, we were not confident that any of the four whale shark predicted proteins with RUNX family hits is a RUNX3 ortholog.
| Whale Shark ID | E-value | Alignment Length | Predicted Protein Length | % Identity |
|---|---|---|---|---|
| g47267.t1 | 4e-93 | 177 | 415 | 76.27 |
| g47586.t1 | 1e-64 | 151 | 415 | 78.05 |
| g43387.t1 | 1e-64 | 123 | 415 | 70.20 |
| g26895.t1 | 1e-04 | 67 | 415 | 26.87 |
| g41064.t1 | 1e-04 | 51 | 415 | 31.37 |
Table 1. Best BLASTp hits for human RUNX3 against the whale shark predicted protein database. The search was performed using the Galaxy server, with the human protein sequence as the query. The top five hits based on the lowest E-values are reported here with their database ID, E-value, alignment length, predicted protein length, and percent identity.
| Whale Shark ID | Top Hit Name | E-value | Query Coverage | Identity |
|---|---|---|---|---|
| g47267.t1 | Runt-related transcription factor 1 isoform AML1a | 1e-104 | 73% | 76.27% |
| g47586.t1 | Runt-related transcription factor 1 isoform AML1a | 5e-62 | 73% | 78.05% |
| g43387.t1 | Predicted: Runt-related transcription factor 2 isoform X10 | 2e-64 | 79% | 70.20% |
| g26895.t1 | Zinc finger CCCH domain-containing protein | 7e-22 | 91% | 26.87% |
| g41064.t1 | Predicted: Runt-related transcription factor 2 isoform X9 | 0.93 | 41% | 31.37% |
Table 2. The top five whale shark hits were also used as queries against the NCBI human protein database. The name of the top hit (according to the lowest E-value), its E-value, query coverage, and percent identity are reported here.
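The reciprocal check behind Table 2 can be expressed as a short Python sketch: a whale shark protein counts as a candidate RUNX3 ortholog only if its top human hit is itself RUNX3. The function name `classify_hit` and the simple string matching are assumptions for illustration; the hit names are copied from Table 2.

```python
# Top human hit for each whale shark predicted protein, copied from Table 2.
reciprocal_hits = {
    "g47267.t1": "Runt-related transcription factor 1 isoform AML1a",
    "g47586.t1": "Runt-related transcription factor 1 isoform AML1a",
    "g43387.t1": "Predicted: Runt-related transcription factor 2 isoform X10",
    "g26895.t1": "Zinc finger CCCH domain-containing protein",
    "g41064.t1": "Predicted: Runt-related transcription factor 2 isoform X9",
}


def classify_hit(description):
    """Label a hit description as 'RUNX3', 'other RUNX', or 'non-RUNX'."""
    text = description.lower()
    if "runt-related transcription factor 3" in text:
        return "RUNX3"
    if "runt-related transcription factor" in text:
        return "other RUNX"
    return "non-RUNX"


labels = {gene: classify_hit(name) for gene, name in reciprocal_hits.items()}
reciprocal_orthologs = [gene for gene, label in labels.items() if label == "RUNX3"]
```

With the Table 2 names, the list of reciprocal RUNX3 hits is empty, four proteins fall under other RUNX family members, and g26895.t1 falls outside the family.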

Protein Domains
With one exception, the five best-hit predicted RUNX3 proteins in the whale shark (from the BLAST results) contain just one domain, the RUNT domain (Figure 2). The RUNT domain family consists of three genes, RUNX1, RUNX2, and RUNX3 (Cameron and Neil 2004). The domain contains a highly conserved sequence that directs the binding of RUNX proteins to DNA together with their partner protein CBFβ (Levanon et al. 2003). The binding of these two proteins allows the RUNX proteins to activate or inhibit transcription of certain genes (Lin et al. 2012).
Figure 2. Predicted protein domain of the best-hit RUNX3 putative protein. The best hit whale shark predicted protein contains just the RUNT domain.

Orthologs
The human RUNX3 protein sequence was then compared with proteins from several other species using the NCBI BLAST program. The human protein sequence was searched against the protein databases of mouse, zebrafish, fruit fly, yeast, and elephant shark. The top hits are shown below (Table 3). The results indicated that the mouse, zebrafish, and elephant shark have RUNX3 orthologs. Of these, we are certain that the RUNX3 gene found in the mouse is a true ortholog because the protein has been previously identified and mice are still used to study the RUNX3 protein (Lin et al. 2012).
| Species | Top Hit Name | ID | Length | Query Coverage | E-value | Identity |
|---|---|---|---|---|---|---|
| Human | Runt-related transcription factor 3 isoform 2 [Homo sapiens] | NP_004341.1 | 429 | 100% | 0.0 | 100% |
| Mouse | Runt-related transcription factor 3 [Mus musculus] | NP_062706.2 | 423 | 98% | 0.0 | 91% |
| Zebrafish | Runt-related transcription factor 3 [Danio rerio] | NP_571679.2 | 424 | 100% | 0.0 | 100% |
| Yeast | UV-damaged DNA-binding protein RAD7 [Saccharomyces cerevisiae S288c] | NP_012586.1 | 565 | 20% | 0.19 | 19% |
| Fruit Fly | RunxA, isoform C [Drosophila melanogaster] | NP_608400.2 | 651 | 35% | 9e-64 | 71% |
| Elephant Shark | Predicted: runt-related transcription factor 3 [Callorhinchus milii] | XP_007902437.1 | 408 | 100% | 0.0 | 72% |
Table 3. Best hits from protein BLASTs with human RUNX3 as the query. The human RUNX3 protein sequence was used in a protein BLAST against the protein databases of different species. The species, the name and ID of the best hit, its length, query coverage, E-value, and percent identity are shown here. RUNX3 was the best hit for human, mouse, zebrafish, and elephant shark.

Phylogeny
A phylogenetic tree (Figure 3) was constructed using the best hits from the different species listed in Table 3, obtained from the BLAST searches with the human RUNX3 protein as the query sequence, together with the five whale shark predicted proteins from Table 1. The two best whale shark hits, g47267.t1 and g47586.t1 (Table 1), are the whale shark proteins most closely related to human RUNX3 because they share the most recent common ancestor with it. In contrast, g41064.t1 shares the most distant common ancestor, making it the least related to the human protein. Although g47267.t1 and g47586.t1 were not confirmed as orthologs, they are grouped together with the human protein and all the species that were found to have RUNX3 orthologs.
Figure 3. A phylogenetic tree (constructed with ClustalW) showing the best hits for the human RUNX3 protein in mouse, elephant shark, yeast, zebrafish, and fruit fly, and the five best-hit predicted whale shark proteins (g47267.t1, g47586.t1, g43387.t1, g26895.t1, and g41064.t1).

Conclusion
We were not able to decisively identify a RUNX3 ortholog in the whale shark. However, we were able to find orthologs of the RUNX3 protein in the mouse, zebrafish, and elephant shark. Although our two best predicted whale shark proteins were not confirmed as orthologs, they were grouped together with human RUNX3 and all the species that were found to have RUNX3 orthologs (Figure 3). This suggests that the whale shark has some form or part of the RUNX3 gene. It could also mean that the members of the RUNT family are not that different in structure, despite their different functions. More research can be done to determine exactly what differs between these orthologs, human RUNX3, and the two whale shark genes g47267.t1 and g47586.t1, such as the specific differences in their sequences. It is evident, however, that whale sharks do have some form of the RUNT domain transcription factors because their top hits were RUNX1 and RUNX2 proteins. Research shows that despite belonging to the same family, the three RUNX proteins all have different functions in the body. RUNX1 and RUNX2 have been found to be important in organogenesis and are also associated with other cancers (Bangsow et al. 2001). Further research should be done to differentiate the three RUNX proteins. Furthermore, what makes the RUNX1 and RUNX2 proteins seemingly more prevalent in the whale shark should also be investigated, because all three RUNX genes play important roles in research on cancer and other human diseases.

Resources:
Ensembl Genome Sequence Extractor: http://www.ensembl.org/index.html
NCBI BLAST Website: http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastHome
Whale Shark Galaxy Server: http://whaleshark.georgiaaquarium.org/
CLUSTALW Phylogenetic Tree: http://www.genome.jp/tools/clustalw/

References
  1. Bangsow, Carmen, Nir Rubins, Gustavo Glusman, Yael Bernstein, Varda Negreanu, Dalia Goldenberg, Joseph Lotem, Edna Ben-Asher, Doron Lancet, Ditsa Levanon, and Yoram Groner. "The RUNX3 Gene – Sequence, Structure and Regulated Expression." Gene 279.2 (2001): 221-32. Print.
  2. Cameron, Ewan R, and James C Neil. "The Runx Genes: Lineage-specific Oncogenes and Tumor Suppressors." Oncogene 23.24 (2004): 4308-314. Print.
  3. Levanon, Ditsa, Ori Brenner, Florian Otto, and Yoram Groner. "Runx3 Knockouts and Stomach Cancer." EMBO Reports 4.6 (2003): 560-64. Print.
  4. Li, Qing-Lin, Kosei Ito, Chohei Sakakura, Hiroshi Fukamachi, Ken-Ichi Inoue, Xin-Zi Chi, Kwang-Youl Lee, Shintaro Nomura, Chang-Woo Lee, Sang-Bae Han, Hwan-Mook Kim, Wun-Jae Kim, Hiromitsu Yamamoto, Namiko Yamashita, Takashi Yano, Toshio Ikeda, Shigeyoshi Itohara, Johji Inazawa, Tatsuo Abe, Akeo Hagiwara, Hisakazu Yamagishi, Asako Ooe, Atsushi Kaneda, Takashi Sugimura, Toshikazu Ushijima, Suk-Chul Bae, and Yoshiaki Ito. "Causal Relationship between the Loss of RUNX3 Expression and Gastric Cancer." Cell 109.1 (2002): 113-24. Print.
  5. Lin, F-C, Y-P Liu, C-H Lai, Y-S Shan, H-C Cheng, P-I Hsu, C-H Lee, Y-C Lee, H-Y Wang, C-H Wang, J Q Cheng, M. Hsiao, and P-J Lu. "RUNX3-mediated Transcriptional Inhibition of Akt Suppresses Tumorigenesis of Human Gastric Cancer Cells." Oncogene 31.39 (2012): 4302–4316. Print.
  6. Lotem, Joseph, Ditsa Levanon, Varda Negreanu, Omri Bauer, Shay Hantisteanu, Joseph Dicken, and Yoram Groner. "Runx3 at the Interface of Immunity, Inflammation and Cancer." Biochimica Et Biophysica Acta (BBA) - Reviews on Cancer 1855.2 (2015): 131-43. Print.
  7. Lund, Anders H, and Maarten Van Lohuizen. "RUNX: A Trilogy of Cancer Genes." Cancer Cell 1.3 (2002): 213-15. Print.